Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

Authors

Yan Li

Zhaohan Sun

Abstract

In the most common settings of a Markov Decision Process (MDP), an agent evaluates a policy by the expectation of the (discounted) sum of rewards. However, in many applications this criterion may be unsuitable, for two reasons. First, in risk-averse settings the expectation of accumulated rewards is not robust enough, which is the case when the distribution of the accumulated reward is heavily skewed. Second, many applications naturally take several objectives into consideration when evaluating a policy; for instance, in autonomous driving an agent needs to balance speed and safety when choosing an appropriate decision. In this paper, we consider evaluating a policy based on the sequence of quantiles it induces on a set of target states. Our idea is to reformulate the original problem as a multi-objective MDP with a naturally defined lexicographic preference. To compute an optimal policy, we propose an algorithm, FLMDP, that can solve general multi-objective MDPs with lexicographic reward preference.
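
As a rough illustration of what a lexicographic preference over several objectives means, the sketch below selects actions by filtering on each objective in priority order. This is not the paper's FLMDP algorithm, which the abstract does not spell out. The function name `lexicographic_argmax`, the `tol` parameter, and the example numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def lexicographic_argmax(
    q_values: Sequence[Sequence[float]], tol: float = 1e-9
) -> List[int]:
    """Return the actions that are best under a lexicographic preference.

    q_values[i][a] is an (assumed, already computed) value estimate of
    action a under objective i. Objectives are listed from highest to
    lowest priority. A lower-priority objective is only used to break
    ties among actions that are (near-)optimal for all earlier ones.
    """
    if not q_values:
        return []
    # Start with every action as a candidate.
    candidates = list(range(len(q_values[0])))
    for objective in q_values:
        # Best achievable value among the surviving candidates.
        best = max(objective[a] for a in candidates)
        # Keep only candidates within `tol` of that best value.
        candidates = [a for a in candidates if objective[a] >= best - tol]
    return candidates


# Hypothetical example: objective 0 might stand for safety, objective 1 for speed.
q = [
    [1.0, 1.0, 0.5],  # highest priority: actions 0 and 1 tie
    [0.2, 0.7, 0.9],  # lower priority: breaks the tie in favor of action 1
]
print(lexicographic_argmax(q))
```

Action 2 has the best value on the second objective but is discarded because it is worse on the first, which is the defining feature of a lexicographic ordering as opposed to a weighted sum of objectives.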

Similar resources

Constrained consumable resource allocation in alternative stochastic networks via multi-objective decision making

Many real projects complete through the realization of one and only one path of various possible network paths. Here, these networks are called alternative stochastic networks (ASNs). It is supposed that the nodes of considered network are probabilistic with exclusive-or receiver and exclusive-or emitter. First, an analytical approach is proposed to simplify the structure of t...

A multiple objective approach for joint ordering and pricing planning problem with stochastic lead times

The integration of marketing and demand with logistics and inventories (supply side of companies) may cause multiple improvements; it can revolutionize the management of the revenue of rental companies, hotels, and airlines. In this paper, we develop a multi-objective pricing-inventory model for a retailer. Maximizing the retailer's profit and the service level are the objectives, and shorta...

Solving matrix games with hesitant fuzzy pay-offs

The objective of this paper is to develop matrix games with pay-offs of triangular hesitant fuzzy elements (THFEs). To solve such games, a new methodology has been derived based on the notion of weighted average operator and score function of THFEs. Firstly, we formulate two non-linear programming problems with THFEs. Then applying the score function of THFEs, we transform these two problems in...

A New Method For Solving Linear Bilevel Multi-Objective Multi-Follower Programming Problem

Linear bilevel programming is a decision making problem with a two-level decentralized organization. The leader is in the upper level and the follower, in the lower level. This study addresses linear bilevel multi-objective multi-follower programming (LB-MOMFP) problem, a special case of linear bilevel programming problems with one leader and multiple followers where each decision maker has sev...

Solving Critical Path Problem in Project Network by a New Enhanced Multi-objective Optimization of Simple Ratio Analysis Approach with Interval Type-2 Fuzzy Sets

Decision making is an important issue in business and project management that assists finding the optimal alternative from a number of feasible alternatives. Decision making requires adequate consideration of uncertainty in projects. In this paper, in order to address uncertainty of project environments, interval type-2 fuzzy sets (IT2FSs) are used. In other words, the rating of each alternativ...

Journal: CoRR

Volume: abs/1705.03597

Publication date: 2017
